Learning machine based detection of abnormal network performance

ABSTRACT

In one embodiment, techniques are shown and described relating to learning machine based detection of abnormal network performance. In particular, in one embodiment, a border router receives a set of network properties x_(i) and network performance metrics M_(i) from a network management server (NMS), and then intercepts x_(i) and M_(i) transmitted from nodes in a computer network of the border router. As such, the border router may then build a regression function F based on x_(i) and M_(i), and can detect one or more anomalies in the intercepted x_(i) and M_(i) based on the regression function F. In another embodiment, the NMS, which instructed the border router, receives the detected anomalies from the border router.

RELATED APPLICATION

The present application claims priority to U.S. Provisional Application Ser. No. 61/761,117, filed Feb. 5, 2013, entitled “LEARNING MACHINE BASED DETECTION OF ABNORMAL NETWORK PERFORMANCE”, by Vasseur, et al., the contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates generally to computer networks, and, more particularly, to the use of learning machines within computer networks.

BACKGROUND

Low power and Lossy Networks (LLNs), e.g., Internet of Things (IoT) networks, have a myriad of applications, such as sensor networks, Smart Grids, and Smart Cities. Various challenges are presented with LLNs, such as lossy links, low bandwidth, low quality transceivers, battery operation, low memory and/or processing capability, etc. The challenging nature of these networks is exacerbated by the large number of nodes (an order of magnitude larger than a “classic” IP network), thus making the routing, Quality of Service (QoS), security, network management, and traffic engineering extremely challenging, to mention a few.

Machine learning (ML) is concerned with the design and the development of algorithms that take as input empirical data (such as network statistics and states, and performance indicators), recognize complex patterns in these data, and solve complex problems such as regression (which are usually extremely hard to solve mathematically) thanks to modeling. In general, these patterns and computation of models are then used to make decisions automatically (i.e., closed-loop control) or to help make decisions. ML is a very broad discipline used to tackle very different problems (e.g., computer vision, robotics, data mining, search engines, etc.), but the most common tasks are the following: linear and non-linear regression, classification, clustering, dimensionality reduction, anomaly detection, optimization, and association rule learning.

One very common pattern among ML algorithms is the use of an underlying model M, whose parameters are optimized for minimizing the cost function associated to M, given the input data. For instance, in the context of classification, the model M may be a straight line that separates the data into two classes such that M=a*x+b*y+c and the cost function would be the number of misclassified points. The ML algorithm then consists in adjusting the parameters a,b,c such that the number of misclassified points is minimal. After this optimization phase (or learning phase), the model M can be used very easily to classify new data points. Often, M is a statistical model, and the cost function is inversely proportional to the likelihood of M, given the input data. Note that the example above is an over-simplification of more complicated regression problems that are usually highly multi-dimensional.

Learning Machines (LMs) are computational entities that rely on one or more ML algorithms for performing a task for which they haven't been explicitly programmed. In particular, LMs are capable of adjusting their behavior to their environment (that is, “auto-adapting” without requiring a priori configuring static rules). In the context of LLNs, and more generally in the context of the IoT (or Internet of Everything, IoE), this ability will be very important, as the network will face changing conditions and requirements, and the network will become too large for efficient management by a network operator. In addition, LLNs in general may significantly differ according to their intended use and deployed environment.

Thus far, LMs have not generally been used in LLNs, despite the overall level of complexity of LLNs, where “classic” approaches (based on known algorithms) are inefficient or when the amount of data cannot be processed by a human to predict network behavior considering the number of parameters to be taken into account.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments herein may be better understood by referring to the following description in conjunction with the accompanying drawings in which like reference numerals indicate identically or functionally similar elements, of which:

FIG. 1 illustrates an example communication network;

FIG. 2 illustrates an example network device/node;

FIG. 3 illustrates an example directed acyclic graph (DAG) in the communication network of FIG. 1;

FIG. 4 illustrates an example Bayesian network;

FIG. 5 illustrates an example Bayesian network for linear regression;

FIG. 6 illustrates an example learning machine network;

FIGS. 7A-7C illustrate an example learning machine network;

FIG. 8 illustrates an example feature tree;

FIG. 9 illustrates an example learning machine architecture;

FIG. 10 illustrates an example regression graph;

FIG. 11 illustrates an example learning machine architecture implementation;

FIG. 12 illustrates an example simplified procedure for learning machine based detection of abnormal network performance in accordance with one or more embodiments described herein, particularly from the perspective of a border router;

FIG. 13 illustrates an example simplified procedure for building a regression function and determining relevant features to use as input to the regression algorithm in accordance with one or more embodiments described herein; and

FIG. 14 illustrates an example simplified procedure for learning machine based detection of abnormal network performance in accordance with one or more embodiments described herein, particularly from the perspective of a network management server (NMS).

DESCRIPTION OF EXAMPLE EMBODIMENTS

Overview

According to one or more embodiments of the disclosure, techniques are shown and described relating to learning machine based detection of abnormal network performance. In particular, in one embodiment, a border router receives a set of network properties x_(i) and network performance metrics M_(i) from a network management server (NMS), and then intercepts x_(i) and M_(i) transmitted from nodes in a computer network of the border router. As such, the border router may then build a regression function F based on x_(i) and M_(i), and can detect one or more anomalies in the intercepted x_(i) and M_(i) based on the regression function F.

In another embodiment, the NMS determines a set of network properties x_(i) and network performance metrics M_(i), sends them to a border router of a computer network, and receives, from the border router, one or more detected anomalies in intercepted x_(i) and M_(i) transmitted from nodes in the computer network based on a regression function F built by the border router based on x_(i) and M_(i).

DESCRIPTION

A computer network is a geographically distributed collection of nodes interconnected by communication links and segments for transporting data between end nodes, such as personal computers and workstations, or other devices, such as sensors, etc. Many types of networks are available, ranging from local area networks (LANs) to wide area networks (WANs). LANs typically connect the nodes over dedicated private communications links located in the same general physical location, such as a building or campus. WANs, on the other hand, typically connect geographically dispersed nodes over long-distance communications links, such as common carrier telephone lines, optical lightpaths, synchronous optical networks (SONET), synchronous digital hierarchy (SDH) links, or Powerline Communications (PLC) such as IEEE 61334, IEEE P1901.2, and others. In addition, a Mobile Ad-Hoc Network (MANET) is a kind of wireless ad-hoc network, which is generally considered a self-configuring network of mobile routers (and associated hosts) connected by wireless links, the union of which forms an arbitrary topology.

Smart object networks, such as sensor networks, in particular, are a specific type of network having spatially distributed autonomous devices such as sensors, actuators, etc., that cooperatively monitor physical or environmental conditions at different locations, such as, e.g., energy/power consumption, resource consumption (e.g., water/gas/etc. for advanced metering infrastructure or “AMI” applications), temperature, pressure, vibration, sound, radiation, motion, pollutants, etc. Other types of smart objects include actuators, e.g., responsible for turning on/off an engine or performing any other actions. Sensor networks, a type of smart object network, are typically shared-media networks, such as wireless or PLC networks. That is, in addition to one or more sensors, each sensor device (node) in a sensor network may generally be equipped with a radio transceiver or other communication port such as PLC, a microcontroller, and an energy source, such as a battery. Often, smart object networks are considered field area networks (FANs), neighborhood area networks (NANs), personal area networks (PANs), etc. Generally, size and cost constraints on smart object nodes (e.g., sensors) result in corresponding constraints on resources such as energy, memory, computational speed and bandwidth.

FIG. 1 is a schematic block diagram of an example computer network 100 illustratively comprising nodes/devices 110 (e.g., labeled as shown, “root,” “11,” “12,” . . . “45,” and described in FIG. 2 below) interconnected by various methods of communication. For instance, the links 105 may be wired links or shared media (e.g., wireless links, PLC links, etc.) where certain nodes 110, such as, e.g., routers, sensors, computers, etc., may be in communication with other nodes 110, e.g., based on distance, signal strength, current operational status, location, etc. The illustrative root node, such as a field area router (FAR) of a FAN, may interconnect the local network with a WAN 130, which may house one or more other relevant devices such as management devices or servers 150, e.g., a network management server (NMS), a dynamic host configuration protocol (DHCP) server, a constrained application protocol (CoAP) server, etc. Those skilled in the art will understand that any number of nodes, devices, links, etc. may be used in the computer network, and that the view shown herein is for simplicity. Also, those skilled in the art will further understand that while the network is shown in a certain orientation, particularly with a “root” node, the network 100 is merely an example illustration that is not meant to limit the disclosure.

Data packets 140 (e.g., traffic and/or messages) may be exchanged among the nodes/devices of the computer network 100 using predefined network communication protocols such as certain known wired protocols, wireless protocols (e.g., IEEE Std. 802.15.4, WiFi, Bluetooth®, etc.), PLC protocols, or other shared-media protocols where appropriate. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.

FIG. 2 is a schematic block diagram of an example node/device 200 that may be used with one or more embodiments described herein, e.g., as any of the nodes or devices shown in FIG. 1 above. The device may comprise one or more network interfaces 210 (e.g., wired, wireless, PLC, etc.), at least one processor 220, and a memory 240 interconnected by a system bus 250, as well as a power supply 260 (e.g., battery, plug-in, etc.).

The network interface(s) 210 contain the mechanical, electrical, and signaling circuitry for communicating data over links 105 coupled to the network 100. The network interfaces may be configured to transmit and/or receive data using a variety of different communication protocols. Note, further, that the nodes may have two different types of network connections 210, e.g., wireless and wired/physical connections, and that the view herein is merely for illustration. Also, while the network interface 210 is shown separately from power supply 260, for PLC (where the PLC signal may be coupled to the power line feeding into the power supply) the network interface 210 may communicate through the power supply 260, or may be an integral component of the power supply.

The memory 240 comprises a plurality of storage locations that are addressable by the processor 220 and the network interfaces 210 for storing software programs and data structures associated with the embodiments described herein. Note that certain devices may have limited memory or no memory (e.g., no memory for storage other than for programs/processes operating on the device and associated caches). The processor 220 may comprise hardware elements or hardware logic adapted to execute the software programs and manipulate the data structures 245. An operating system 242, portions of which are typically resident in memory 240 and executed by the processor, functionally organizes the device by, inter alia, invoking operations in support of software processes and/or services executing on the device. These software processes and/or services may comprise a routing process/services 244 and an illustrative “learning machine” process 248, which may be configured depending upon the particular node/device within the network 100 with functionality ranging from intelligent learning machine algorithms to merely communicating with intelligent learning machines, as described herein. Note also that while the learning machine process 248 is shown in centralized memory 240, alternative embodiments provide for the process to be specifically operated within the network interfaces 210.

It will be apparent to those skilled in the art that other processor and memory types, including various computer-readable media, may be used to store and execute program instructions pertaining to the techniques described herein. Also, while the description illustrates various processes, it is expressly contemplated that various processes may be embodied as modules configured to operate in accordance with the techniques herein (e.g., according to the functionality of a similar process). Further, while the processes have been shown separately, those skilled in the art will appreciate that processes may be routines or modules within other processes.

Routing process (services) 244 contains computer executable instructions executed by the processor 220 to perform functions provided by one or more routing protocols, such as proactive or reactive routing protocols as will be understood by those skilled in the art. These functions may, on capable devices, be configured to manage a routing/forwarding table (a data structure 245) containing, e.g., data used to make routing/forwarding decisions. In particular, in proactive routing, connectivity is discovered and known prior to computing routes to any destination in the network, e.g., link state routing such as Open Shortest Path First (OSPF), or Intermediate-System-to-Intermediate-System (ISIS), or Optimized Link State Routing (OLSR). Reactive routing, on the other hand, discovers neighbors (i.e., does not have an a priori knowledge of network topology), and in response to a needed route to a destination, sends a route request into the network to determine which neighboring node may be used to reach the desired destination. Example reactive routing protocols may comprise Ad-hoc On-demand Distance Vector (AODV), Dynamic Source Routing (DSR), DYnamic MANET On-demand Routing (DYMO), etc. Notably, on devices not capable or configured to store routing entries, routing process 244 may consist solely of providing mechanisms necessary for source routing techniques. That is, for source routing, other devices in the network can tell the less capable devices exactly where to send the packets, and the less capable devices simply forward the packets as directed.

Notably, mesh networks have become increasingly popular and practical in recent years. In particular, shared-media mesh networks, such as wireless or PLC networks, etc., are often on what is referred to as Low-Power and Lossy Networks (LLNs), which are a class of network in which both the routers and their interconnect are constrained: LLN routers typically operate with constraints, e.g., processing power, memory, and/or energy (battery), and their interconnects are characterized by, illustratively, high loss rates, low data rates, and/or instability. LLNs are comprised of anything from a few dozen and up to thousands or even millions of LLN routers, and support point-to-point traffic (between devices inside the LLN), point-to-multipoint traffic (from a central control point such as the root node to a subset of devices inside the LLN) and multipoint-to-point traffic (from devices inside the LLN towards a central control point).

An example implementation of LLNs is an “Internet of Things” network. Loosely, the term “Internet of Things” or “IoT” (or “Internet of Everything” or “IoE”) may be used by those in the art to refer to uniquely identifiable objects (things) and their virtual representations in a network-based architecture. In particular, the next frontier in the evolution of the Internet is the ability to connect more than just computers and communications devices, but rather the ability to connect “objects” in general, such as lights, appliances, vehicles, HVAC (heating, ventilating, and air-conditioning), windows and window shades and blinds, doors, locks, etc. The “Internet of Things” thus generally refers to the interconnection of objects (e.g., smart objects), such as sensors and actuators, over a computer network (e.g., IP), which may be the Public Internet or a private network. Such devices have been used in the industry for decades, usually in the form of non-IP or proprietary protocols that are connected to IP networks by way of protocol translation gateways. With the emergence of a myriad of applications, such as the smart grid, smart cities, and building and industrial automation, and cars (e.g., that can interconnect millions of objects for sensing things like power quality, tire pressure, and temperature and that can actuate engines and lights), it has been of the utmost importance to extend the IP protocol suite for these networks.

An example protocol specified in an Internet Engineering Task Force (IETF) Proposed Standard, Request for Comment (RFC) 6550, entitled “RPL: IPv6 Routing Protocol for Low Power and Lossy Networks” by Winter, et al. (March 2012), provides a mechanism that supports multipoint-to-point (MP2P) traffic from devices inside the LLN towards a central control point (e.g., LLN Border Routers (LBRs), FARs, or “root nodes/devices” generally), as well as point-to-multipoint (P2MP) traffic from the central control point to the devices inside the LLN (and also point-to-point, or “P2P” traffic). RPL (pronounced “ripple”) may generally be described as a distance vector routing protocol that builds a Directed Acyclic Graph (DAG) for use in routing traffic/packets 140, in addition to defining a set of features to bound the control traffic, support repair, etc. Notably, as may be appreciated by those skilled in the art, RPL also supports the concept of Multi-Topology-Routing (MTR), whereby multiple DAGs can be built to carry traffic according to individual requirements.

Also, a directed acyclic graph (DAG) is a directed graph having the property that all edges are oriented in such a way that no cycles (loops) are supposed to exist. All edges are contained in paths oriented toward and terminating at one or more root nodes (e.g., “clusterheads” or “sinks”), often to interconnect the devices of the DAG with a larger infrastructure, such as the Internet, a wide area network, or other domain. In addition, a Destination Oriented DAG (DODAG) is a DAG rooted at a single destination, i.e., at a single DAG root with no outgoing edges. A “parent” of a particular node within a DAG is an immediate successor of the particular node on a path towards the DAG root, such that the parent has a lower “rank” than the particular node itself, where the rank of a node identifies the node's position with respect to a DAG root (e.g., the farther away a node is from a root, the higher is the rank of that node). Note also that a tree is a kind of DAG, where each device/node in the DAG generally has one parent or one preferred parent. DAGs may generally be built (e.g., by a DAG process and/or routing process 244) based on an Objective Function (OF). The role of the Objective Function is generally to specify rules on how to build the DAG (e.g. number of parents, backup parents, etc.).
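By way of illustration only, the parent/rank relationship described above may be sketched as follows. The sketch assumes, purely for simplicity, that the rank of a node equals its hop distance from the root (in RPL the rank is determined by the Objective Function); the function name build_dodag, the node labels, and the link list are hypothetical.

```python
from collections import deque

def build_dodag(links, root):
    """Breadth-first construction of a tree-shaped DODAG.

    Each node takes as parent the neighbor through which it is first
    reached, so every parent has a strictly lower rank than its child.
    ASSUMPTION: rank == hop distance from the root (illustrative only).
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    rank = {root: 0}
    parent = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in rank:          # not yet attached to the DODAG
                rank[nxt] = rank[node] + 1
                parent[nxt] = node
                queue.append(nxt)
    return parent, rank

# Hypothetical undirected links between nodes of the example network.
links = [("root", "11"), ("root", "12"), ("11", "21"), ("12", "21"), ("21", "31")]
parent, rank = build_dodag(links, "root")
```

In this sketch, node “21” can reach the root through either “11” or “12”; only one of them is retained as the preferred parent, which is what makes the resulting structure a tree.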

FIG. 3 illustrates an example simplified DAG that may be created, e.g., through the techniques described above, within network 100 of FIG. 1. For instance, certain links 105 may be selected for each node to communicate with a particular parent (and thus, in the reverse, to communicate with a child, if one exists). These selected links form the DAG 310 (shown as bolded lines), which extends from the root node toward one or more leaf nodes (nodes without children). Traffic/packets 140 (shown in FIG. 1) may then traverse the DAG 310 in either the upward direction toward the root or downward toward the leaf nodes, particularly as described herein.

Learning Machine Technique(s)

As noted above, machine learning (ML) is concerned with the design and the development of algorithms that take as input empirical data (such as network statistics and state, and performance indicators), recognize complex patterns in these data, and solve complex problems such as regression thanks to modeling. One very common pattern among ML algorithms is the use of an underlying model M, whose parameters are optimized for minimizing the cost function associated to M, given the input data. For instance, in the context of classification, the model M may be a straight line that separates the data into two classes such that M=a*x+b*y+c and the cost function would be the number of misclassified points. The ML algorithm then consists in adjusting the parameters a,b,c such that the number of misclassified points is minimal. After this optimization phase (or learning phase), the model M can be used very easily to classify new data points. Often, M is a statistical model, and the cost function is inversely proportional to the likelihood of M, given the input data.
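As a minimal sketch of the classification example above, the following code evaluates the cost function (the number of misclassified points) for a line M=a*x+b*y+c and selects the best of a few candidate parameter triples. The data points, the candidate list, and the function names are hypothetical; an actual learning algorithm would search the parameter space rather than a fixed list.

```python
def classify(a, b, c, x, y):
    """Return +1 or -1 depending on the sign of M = a*x + b*y + c."""
    return 1 if a * x + b * y + c >= 0 else -1

def misclassified(a, b, c, points):
    """Cost function: number of labeled points (x, y, label) on the wrong side."""
    return sum(1 for x, y, label in points if classify(a, b, c, x, y) != label)

# Hypothetical labeled data: (x, y, class label in {-1, +1}).
points = [(0.0, 0.0, -1), (1.0, 0.5, -1), (3.0, 3.0, 1), (4.0, 2.5, 1)]

# Hypothetical candidate parameters (a, b, c); the "learning phase" here is
# reduced to picking the candidate with the lowest cost.
candidates = [(1.0, 0.0, -5.0), (1.0, 1.0, -4.0), (0.0, 1.0, -0.2)]
best = min(candidates, key=lambda p: misclassified(*p, points))
```

Once the parameters are chosen, classify() can be applied directly to new data points, which corresponds to using the model after the learning phase.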

As also noted above, learning machines (LMs) are computational entities that rely on one or more ML algorithms for performing a task for which they haven't been explicitly programmed. In particular, LMs are capable of adjusting their behavior to their environment. In the context of LLNs, and more generally in the context of the IoT (or Internet of Everything, IoE), this ability will be very important, as the network will face changing conditions and requirements, and the network will become too large for efficient management by a network operator. Thus far, LMs have not generally been used in LLNs, despite the overall level of complexity of LLNs, where “classic” approaches (based on known algorithms) are inefficient or when the amount of data cannot be processed by a human to predict network behavior considering the number of parameters to be taken into account.

In particular, many LMs can be expressed in the form of a probabilistic graphical model also called Bayesian Network (BN). A BN is a graph G=(V,E) where V is the set of vertices and E is the set of edges. The vertices are random variables, e.g., X, Y, and Z (see FIG. 4) whose joint distribution P(X,Y,Z) is given by a product of conditional probabilities:

$$P(X,Y,Z) = P(Z \mid X,Y)\,P(Y \mid X)\,P(X) \quad \text{(Eq. 1)}$$

The conditional probabilities in Eq. 1 are given by the edges of the graph in FIG. 4. In the context of LMs, BNs are used to construct the model M as well as its parameters.
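The factorization of Eq. 1 may be illustrated with a small numerical sketch. The three variables are assumed to be binary and all probability values below are hypothetical, chosen only so that each conditional table sums to one.

```python
from itertools import product

# Hypothetical probability tables for binary X, Y, Z.
p_x = {0: 0.7, 1: 0.3}                                   # P(X)
p_y_given_x = {0: {0: 0.9, 1: 0.1},                      # P(Y | X), indexed [x][y]
               1: {0: 0.4, 1: 0.6}}
p_z_given_xy = {(0, 0): {0: 0.99, 1: 0.01},              # P(Z | X, Y), indexed [(x, y)][z]
                (0, 1): {0: 0.50, 1: 0.50},
                (1, 0): {0: 0.30, 1: 0.70},
                (1, 1): {0: 0.05, 1: 0.95}}

def joint(x, y, z):
    """Joint probability per Eq. 1: P(Z|X,Y) * P(Y|X) * P(X)."""
    return p_z_given_xy[(x, y)][z] * p_y_given_x[x][y] * p_x[x]

# The joint distribution must sum to one over all assignments.
total = sum(joint(x, y, z) for x, y, z in product((0, 1), repeat=3))
```

Each table corresponds to one vertex of the graph and is conditioned on that vertex's parents, which is how the edges of the BN determine the factors of the product.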

To estimate the relationship between network properties of a node i (or link), noted x_(i), (e.g., hop count, rank, firmware version, etc.) and a given networking metric M_(i), a linear regression may be performed. More specifically, given the following equation:

$$M_i = F(x_i) = b^{T} x_i + \varepsilon \quad \text{(Eq. 2)}$$

where x_(i) is a d-dimensional vector of observed data (e.g., end-node properties such as the rank, the hop count, the distance to the FAR, etc.) and M_(i) is the target metric (e.g., the time to join the network), which is also noted y_(i) sometimes. Building such a model of a performance metric knowing a set of observed features is critical to perform root cause analysis, network monitoring, and configuration: for example the path delay as a function of the node rank, link quality, etc., can then be used to determine whether anomalies appear in the network and thus take some appropriate actions to fix the issue. In the equation (Eq. 2) above, the term ε is a Gaussian random variable used to model the uncertainty and/or the noise on the estimate M_(i). The linear regression consists in finding the weight vector b that fulfills the maximum likelihood criterion (which coincides with the least square criterion when ε is Gaussian). In particular, the optimal b must minimize the Mean Squared Error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i}\left(b^{T} x_i - y_i\right)^{2} \quad \text{(Eq. 3)}$$

where N is the total number of input data points, i.e., i=1, . . . , N.

In other words, b is a set of weights, one per dimension of the observed vector x_(i), used to compute the function F that provides the estimate of M_(i). The MSE is a metric used to compute the “quality” of the model function F.
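For illustration, the linear model of Eq. 2 and the MSE of Eq. 3 may be sketched with a plain least-squares fit. This is not the VBLS algorithm discussed herein; it is an ordinary least-squares stand-in that solves the normal equations by Gaussian elimination. The data set is hypothetical and noise-free (ε = 0), with each x_(i) assumed to be (1, rank, hop count), the leading 1 acting as a bias term.

```python
def solve(A, rhs):
    """Solve A b = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    b = [0.0] * n
    for i in reversed(range(n)):                  # back-substitution
        s = sum(M[i][k] * b[k] for k in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b

def fit_ols(X, y):
    """Weights b minimizing Eq. 3, via the normal equations (X^T X) b = X^T y."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)

def mse(b, X, y):
    """Mean Squared Error of Eq. 3."""
    return sum((sum(bj * xj for bj, xj in zip(b, row)) - yi) ** 2
               for row, yi in zip(X, y)) / len(y)

# Hypothetical observations x_i = (1, rank, hop count), target y_i = 1 + 2*rank + 0.5*hops.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 4.0, 3.0], [1.0, 5.0, 8.0]]
y = [1.0 + 2.0 * r + 0.5 * h for _, r, h in X]
b = fit_ols(X, y)
error = mse(b, X, y)
```

With noise-free data the recovered weights are close to (1, 2, 0.5) and the MSE is close to zero; with real measurements the MSE indicates how well F explains the observed metric.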

The usual approach to the solving of Eq. 2 is the ordinary least squares (OLS) equation, which involves a “d×d” matrix inversion, where d is the number of dimensions. Three main problems arise immediately: (i) the dimensionality of x_(i) may be large, thus making OLS prohibitively expensive in terms of computational cost (approximately O(d³)), (ii) in presence of co-linearity (i.e., when several node properties are strongly correlated, as is the case for the hop count and the ETX, for instance), OLS becomes numerically unstable (i.e., round-off and truncation errors are magnified, causing the MSE to grow exponentially), (iii) OLS being essentially non-probabilistic (i.e., it doesn't account for the whole distribution of its constituent variables, but it merely tracks averages), it cannot cope well with noise and outliers, and it is simply not applicable when ε is not Gaussian.

To overcome these limitations, the problem can be formulated as a BN (see FIG. 5). Now, all variables are considered as random variables, even though they are all observed at this point: both input variable x_(i) and the output variable y_(i) are experimental data, and b is a (non-probabilistic) parameter of the BN at this point. By pushing this approach a little bit further, one may turn b into a random variable as well, and attempt to infer it from experimental data (that is, the observations of x_(i) and y_(i)). However, this inference problem is non-trivial, especially as one desirable feature of this learning algorithm is that it is capable of identifying non-relevant dimensionalities of x (that is, input dimensions that are weakly correlated with the output y), and automatically set the corresponding weights in b to a zero (or a very small) value.

This problem is solved by one recently proposed algorithm called Variational Bayes Least Square (VBLS) regression (Ting, D'Souza, Vijayakumar, & Schaal, 2010). Namely, this algorithm allows for efficient learning and feature selection in high-dimensional regression problems, while avoiding the use of expensive and numerically brittle matrix inversion. VBLS adds a series of non-observed random variables z_(ij) that can be considered as noisy, fake targets of the factor b_(j)·x_(ij), and whose sum Σ_(j)z_(ij) is an estimate of y_(i). In turn, the weights b_(j) are modeled as random variables, thereby allowing for automated feature detection, i.e., the mean of b_(j) converges rapidly to zero if no correlation exists between the various x_(ij) and y_(i).

VBLS estimates the distribution of the non-observed variables z_(i) and b using a variant of the Expectation Maximization algorithm with a variational approximation for the posterior distributions, which are not analytically tractable. Because it is a fully Bayesian approach, VBLS does not require any parameterization, except for the initial (prior) distributions of hidden parameters, which are set in an uninformative way, i.e., with very large variances that lead to flat distributions.

Another critical issue when estimating the mapping between x_(i) and M_(i) is that their relationship may be non-linear. Even in this case, one may use tools from linear regression such as VBLS: instead of performing the mapping between the raw data x and M_(i), one may increase the dimensionality of the input space by extending it with non-linear transformations of the input data. These transformations may be called features, and are noted f_(j)(x). These features f_(j)(x) may be non-linear functions of one or more dimensions of x. Below are a few examples:

$$f_i(x) = x_i$$
$$f_{d+1}(x) = x_1 \cdot x_2$$
$$f_{d+2}(x) = \exp(x_1)$$
$$f_{d+3}(x) = x_1^{3}$$
$$f_{d+4}(x) = \log(x_1)$$

In this context, one may rewrite the linear regression as follows:

$$M_i = F(x_i) = \sum_{j} b_j f_j(x_i) + \varepsilon \quad \text{for } j = 1, 2, \ldots \quad \text{(Eq. 4)}$$

However, this approach poses one key challenge: there is an infinite number of possible features f_(j)(x). As a result, even though VBLS has the ability to perform feature selection in an efficient way, the problem of exploring this infinitely large set of features is yet to be solved. Also, when considering only simple combinations of input dimensions such as f₁(x)=x₁·x₂, f₂(x)=x₁²·x₂, or f₃(x)=x₁·x₂², there is no guarantee that one can construct an accurate mapping F(x_(i)), because there may be a need to incorporate non-integer powers of x (square roots, etc.) or more complex functions such as exp(·), log(·), or even trigonometric functions (e.g., sin(·), cos(·), etc.). This ‘catalogue’ of feature ‘types’ needs to be explored in a more or less intelligent way such that one can construct the most accurate mapping F(x_(i)). Solutions to this problem range from a manual feature selection based on expert knowledge to automated exploration of the solution space using meta-heuristics.
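The feature expansion and the regression of Eq. 4 may be sketched as follows for a two-dimensional input (d=2), using the example features listed above. The names FEATURES, expand, and predict are hypothetical, and the weight vector used at the end is chosen by hand for illustration rather than learned.

```python
import math

# Example features for d = 2: the raw dimensions followed by the
# non-linear transformations given as examples in the text.
FEATURES = [
    lambda x: x[0],               # f_1(x)     = x_1
    lambda x: x[1],               # f_2(x)     = x_2
    lambda x: x[0] * x[1],        # f_{d+1}(x) = x_1 * x_2
    lambda x: math.exp(x[0]),     # f_{d+2}(x) = exp(x_1)
    lambda x: x[0] ** 3,          # f_{d+3}(x) = x_1^3
    lambda x: math.log(x[0]),     # f_{d+4}(x) = log(x_1), requires x_1 > 0
]

def expand(x):
    """Map a raw input vector x to its feature vector (f_1(x), ..., f_m(x))."""
    return [f(x) for f in FEATURES]

def predict(b, x):
    """Noise-free part of Eq. 4: sum over j of b_j * f_j(x)."""
    return sum(bj * fj for bj, fj in zip(b, expand(x)))

# Hand-picked weights that keep only the x_1 * x_2 feature (illustrative only).
b = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
value = predict(b, [2.0, 3.0])
```

The model remains linear in the weights b_(j) even though it is non-linear in x, which is why linear regression tools can still be applied; weights driven to zero correspond to features that have been deselected.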

Currently, techniques consist of: 1) statically configuring the set of relevant networking properties to monitor, using a management information base (MIB) with simple network management protocol (SNMP) or CoAP in the case of LLNs in order to monitor the network behavior and performance (e.g., routing, link loads); 2) retrieving all the information on the NMS; 3) analyzing one or more specific network performance metrics (referred to as M_(i)) such as the quality of service (QoS) or the time for a node n_(i) to join the network; and 4) finding a correlation (e.g., based on 3) between the metric of interest M_(i) and the properties of n_(i) (noted x_(i)). Said differently, current techniques use a centralized approach to perform network monitoring and troubleshooting, constructing a model in order to evaluate a performance metric (e.g., the path delay) according to a set of monitored data (routing tree, link reliability, etc.).

Up to several years ago, step 4) was performed manually by networking experts. With the increase in complexity of existing networks, it became necessary to use various techniques (analytics) to process a wide range of x_(i) and perform correlation between a given set of x_(i) and M_(i). Such correlation is needed in order to build a network performance metric model, and determine whether M_(i) is normal or abnormal, thus leading to root cause analysis. Note that root cause analysis is one of the main challenges in monitoring, troubleshooting, and configuring complex networks.

Unfortunately, the approach described above is ill-suited to LLNs; indeed, the number of relevant networking properties is very large, making the static approach hard to implement, and a "brute force" approach consisting of retrieving all possible x_(i) is simply not possible because of the very limited bandwidth available at all layers between end nodes and the NMS in LLNs/IoT. This makes the current model not just ill-suited to LLNs but generally not applicable at all. As a result, one can observe that in currently deployed LLNs, such as shown in FIG. 6 (illustrating an alternate view of network 100), a very limited number of x_(i) are retrieved by the NMS, making the management of the network simply not possible (in terms of monitoring, troubleshooting, and even configuration).

The techniques herein, therefore, propose a distributed architecture, relying on distributed Learning Machines (named LM_(d): Learning Machine Distributed) hosted on LBRs/FARs sitting at the fringe between the LLNs and the Field Area Network (FAN) in order to build a model for M_(i) using a modified sophisticated linear regression function F(f₁(x_(i)), . . . , f_(m)(x_(i))) where the f_(j)(x_(i)) are non-linear functions called 'features' used to build the regression function F. Note that for the sake of illustration, M is a quality of service metric such as the path delay (called Q), but the techniques herein may be applied to a variety of other metrics such as the time for a node to join a mesh, the PAN migration frequency, etc.

Said differently, the techniques herein make use of a distributed approach driven by the NMS, consisting of using distributed learning machines hosted by Field Area Routers (FARs) that, once informed of the network performance metric of interest, locally intercept a set of network properties in order to build a regression function and detect anomalies. The techniques herein consist of 1) a collaborative interaction between the NMS and the learning machines (LMs) to notify the LMs of the metric of interest M_(i) along with the set of monitored network properties (x_(i)), 2) the interception by the LM of the set of x_(i) and the metric M_(i) to build the regression function F and the novel modification of the VBLS algorithm to dynamically compute the optimal set of features f( ), 3) a technique for guiding the probing of M_(i) so as to maximize the obtained information, 4) a technique for detecting anomalies based on the interval of confidence provided by VBLS, and 5) the reporting of the detected anomalies to the NMS. Generally, reference may be made to FIGS. 7A-7C for operation.

Illustratively, the techniques described herein may be performed by hardware, software, and/or firmware, such as in accordance with the learning machine process 248, which may contain computer executable instructions executed by the processor 220 (or independent processor of interfaces 210) to perform functions relating to the techniques described herein, e.g., optionally in conjunction with other processes. For example, certain aspects of the techniques herein may be treated as extensions to conventional protocols, such as the various communication protocols (e.g., routing process 244), and as such, may be processed by similar components understood in the art that execute those protocols, accordingly. Also, while certain aspects of the techniques herein may be described from the perspective of a single node/device, embodiments described herein may be performed as distributed intelligence, also referred to as edge/distributed computing, such as hosting intelligence within nodes 110 of a Field Area Network in addition to or as an alternative to hosting intelligence within servers 150.

Operationally, a first component of the techniques herein relates to the interaction between the distributed learning machine (LM_(d)) hosted on the LBR (such as a Field Area Router) and the NMS. One of the tasks performed by the end user consists of configuring the set of network properties monitored using the CoAP protocol. Various techniques can be used to minimize the traffic generated for network monitoring in order for the NMS to populate its database with the network property values x_(i) (link load, link qualities, routing parameters, to mention a few). The second parameter is the network performance metric of interest M_(i) (e.g., QoS, joining time, PAN migration, etc.).

The techniques herein specify a novel unicast IPv6 message used by the NMS to communicate both the set of x_(i) and M_(i) to the LM_(d). Upon receiving the set of x_(i) and M_(i), and in contrast with current approaches, the LM_(d) intercepts the x_(i), thus reducing the overall control plane and network management traffic between the LBR and the NMS, since the network properties are effectively consumed by LM_(d).

The second component of the techniques herein is the modification of the VBLS algorithm briefly described above. As already pointed out, in order to build the regression function F required to detect anomalies, the LM_(d) needs to determine the list L_(rel) of relevant features f_(j)(x). At first, L_(rel) is populated with the d basic linear features f_(j)(x)=x_(j), j=1, . . . , d, as well as a number of non-linear features that consist of two types of transformations of the raw input data: (1) products of various combinations of the input dimensions (e.g., f(x)=x₁·x₂ or f(x)=x₁·x₃) or (2) non-linear functions of the raw input (e.g., f(x)=exp(x₁) or f(x)=sinc(x₁)). In principle, one may also allow for a mixture of these transformations (e.g., f(x)=exp(x₁·x₂)), or also include non-linear transformations of linear combinations of input dimensions (e.g., f(x)=exp(x₁+x₂)). However, for most practical purposes, the first two options are sufficient (and restricting the search to them may allow for a dramatic reduction of the search space). To generate these features, the techniques herein use a feature construction (FC) algorithm that constructs new features in a random fashion, but tries to favor features of lower complexity (i.e., that involve fewer terms). This algorithm will be described in further detail below.

Once the list of features L_(rel)=[f₁(x), . . . , f_(d)(x)] is determined, one may use it as the input to a linear regression algorithm for determining F(x). Note that d is often very large (on the order of several thousand dimensions or features) and many features may be collinear, thereby precluding the use of conventional linear regression strategies such as OLS. Further, the techniques herein aim to determine which features are irrelevant to the prediction of M_(i) in order to remove them from L_(rel) and add them to a blacklist L_(irr) (as irrelevant). The FC algorithm shall use L_(irr) to restrict its search space in future iterations. As stated earlier, the techniques herein may use the VBLS algorithm to handle both the very high dimensionality of the input space and the presence of multiple collinear dimensions (notably providing an estimate of the relevance of each dimension).
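The split of features into L_(rel) and L_(irr) can be sketched as follows. This sketch does not implement VBLS: as a plainly named stand-in, it uses ridge regression on standardized features (which also tolerates collinear columns) and treats the normalized weight magnitude as the relevance estimate. The function prune_features, the threshold, the ridge strength, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def prune_features(Phi, M, names, threshold=0.05, ridge=1e-3):
    """Split features into (L_rel, L_irr) by normalized weight magnitude.

    Ridge regression is a stand-in for VBLS here (assumption)."""
    # Standardize columns so that weights are comparable across features.
    Z = (Phi - Phi.mean(axis=0)) / Phi.std(axis=0)
    y = M - M.mean()
    n, m = Z.shape
    # Regularized normal equations remain solvable with collinear columns.
    b = np.linalg.solve(Z.T @ Z + ridge * n * np.eye(m), Z.T @ y)
    relevance = np.abs(b) / np.abs(b).max()
    l_rel = [name for name, r in zip(names, relevance) if r >= threshold]
    l_irr = [name for name, r in zip(names, relevance) if r < threshold]
    return l_rel, l_irr

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                      # synthetic raw inputs
names = ["x1", "x2", "x3", "x1*x3"]
Phi = np.column_stack([X[:, 0], X[:, 1], X[:, 2], X[:, 0] * X[:, 2]])
# Hypothetical metric that depends only on x1 and x2.
M = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, size=500)

l_rel, l_irr = prune_features(Phi, M, names)
```

In this synthetic case the features that do not influence M end up in the blacklist, which a feature construction step could then consult to avoid regenerating them.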

The FC algorithm is a stochastic search algorithm that attempts to construct random non-linear features out of the basic input dimensions x₁, . . . , x_(d).

In particular, a feature can be represented as a tree whose inner nodes are operators and outer nodes (also called leaves) are either constant values or input dimensions x₁, . . . , x_(d) (see FIG. 8). Operators are randomly selected from a user-defined catalogue obtained from the NMS, and they may be unary (non-linear functions such as sin( ), sinc( ), exp( ), etc.) or binary (sum, subtraction, product, division, etc.). The techniques herein randomly generate features by using a hierarchical approach where trees are composed of a single inner node, but leaves may be other trees. Whenever a new feature must be generated, the techniques herein pick an operator at random (possibly with some bias for simple operators such as the product), and randomly select the operands either (1) from L_(rel), with a probability that is proportional to their relevance b_(i), or (2) from randomly generated operators.
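The tree representation and the random construction step can be sketched as below. This is an illustrative simplification rather than the FC algorithm itself: the names Leaf, Node, CATALOGUE, OP_WEIGHTS, and new_feature are hypothetical, constant-valued leaves are omitted, and when an operand is not drawn from L_(rel) the sketch falls back to a random raw input dimension.

```python
import random
from dataclasses import dataclass
from typing import Tuple, Union
import numpy as np

@dataclass(frozen=True)
class Leaf:
    index: int  # 0-based input dimension

    def evaluate(self, X):
        return X[:, self.index]

    def __str__(self):
        return f"x{self.index + 1}"

@dataclass(frozen=True)
class Node:
    op: str
    children: Tuple["Feature", ...]  # operands may themselves be trees

    def evaluate(self, X):
        fn, _ = CATALOGUE[self.op]
        return fn(*(child.evaluate(X) for child in self.children))

    def __str__(self):
        return f"{self.op}({', '.join(map(str, self.children))})"

Feature = Union[Leaf, Node]

# Hypothetical operator catalogue: name -> (function, arity).
CATALOGUE = {"prod": (np.multiply, 2), "sum": (np.add, 2),
             "exp": (np.exp, 1), "sin": (np.sin, 1)}
# Bias toward simple operators such as the product (assumed weights).
OP_WEIGHTS = {"prod": 4.0, "sum": 2.0, "exp": 1.0, "sin": 1.0}

def new_feature(l_rel, relevance, n_dims, rng, p_reuse=0.7):
    """Build one tree: a single operator whose operands are existing features."""
    op = rng.choices(list(OP_WEIGHTS), weights=list(OP_WEIGHTS.values()))[0]
    _, arity = CATALOGUE[op]
    operands = []
    for _ in range(arity):
        if l_rel and rng.random() < p_reuse:
            # Reuse a known feature with probability proportional to relevance.
            operands.append(rng.choices(l_rel, weights=relevance)[0])
        else:
            operands.append(Leaf(rng.randrange(n_dims)))
    return Node(op, tuple(operands))

X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
hand_built = Node("exp", (Node("prod", (Leaf(0), Leaf(1))),))
generated = new_feature([Leaf(0), hand_built], [0.2, 0.8], n_dims=2,
                        rng=random.Random(0))
```

Because operands can be whole trees taken from L_(rel), features of growing depth emerge over successive iterations while each construction step stays a single operator.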

The FC algorithm maintains a list of candidate solutions [S₁, . . . , S_(N)]. Each candidate solution S_(i) is a VBLS instance operating with a list of features F_(i) constructed as above. All candidate solutions may be trained with the same raw input data, but each of them uses a different set of features. Upon creating a candidate solution, all its features are added to L_(rel). At each iteration, their relevance (that is, the value b_(i) computed by VBLS) is updated and the least relevant features are regularly pruned away from L_(rel).

At regular (user-defined) intervals, the fitness of each S_(i) (that is, a score that denotes the quality of S_(i)) is computed. Typically, the techniques herein use the ratio between the MSE yielded by a purely linear model and the MSE yielded by S_(i) as fitness. Then, candidate solutions are randomly replaced by new solutions (generated as described above) with a probability that is inversely proportional to their fitness. Optionally, one may use a so-called 'elitist' scheme in which the best solution is never replaced. Using such an iterative approach, the techniques herein explore the solution space (by constructing random non-linear features) while focusing on promising solutions (by (1) re-using the most relevant features from L_(rel), and (2) using a survival-of-the-fittest strategy).
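The fitness ratio and the replacement rule can be sketched as follows. The normalization is an assumption: the text only requires the replacement probability to be inversely proportional to fitness, so this sketch scales probabilities such that the least fit candidate is replaced with probability `scale`. The function names and the example MSE values are hypothetical.

```python
import numpy as np

def fitness(mse_linear, mse_candidates):
    """Fitness of each S_i: MSE of a purely linear model over MSE of S_i."""
    return mse_linear / np.asarray(mse_candidates, dtype=float)

def replacement_probabilities(fit, elitist=True, scale=0.5):
    """Probability of replacing each candidate, inversely proportional to fitness."""
    fit = np.asarray(fit, dtype=float)
    p = 1.0 / fit
    p = scale * p / p.max()          # least fit candidate gets probability `scale`
    if elitist:
        p[np.argmax(fit)] = 0.0      # the best solution is never replaced
    return p

def select_for_replacement(fit, rng, **kwargs):
    """Boolean mask of the candidates to be replaced in this round."""
    return rng.random(len(fit)) < replacement_probabilities(fit, **kwargs)

fit = fitness(4.0, [1.0, 2.0, 4.0, 8.0])      # hypothetical MSE values
probs = replacement_probabilities(fit)
mask = select_for_replacement(fit, np.random.default_rng(0))
```

A fitness above 1 means the candidate improves on the purely linear model; the last candidate in this example is worse than the linear baseline and is therefore the most likely to be replaced.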

The overall approach, generally illustrated in FIG. 9, makes use of a conventional co-evolutionary approach with enhancements and modifications. Indeed, instead of attempting to evolve the whole regression function, the techniques herein divide the problem into the evolution of the functions and their building blocks (the features). Because the techniques herein rely on VBLS for determining the optimal weights of the latter, the techniques can achieve an important reduction of the search space as compared to the original approach. First, the algorithm simply looks for optimal combinations of features, and VBLS plays the role of determining their relevance. Second, because the building blocks are, by definition, simpler than the whole regression function, the corresponding search space is significantly smaller.

A third component of the techniques herein is a strategy for guiding the probing of M_(i) in a nearly optimal fashion. The techniques herein help the FC algorithm to distinguish between the various candidate solutions S₁, . . . , S_(N). To this end, the techniques probe those nodes n_(j) that yield the maximal divergence in terms of prediction of M_(i) among all candidate solutions. More specifically, for each node n_(j) with properties given by x_(j), the techniques compute the vector M^(j)=[M^(j)₁, . . . , M^(j)_(N)] composed of the estimates of M for each candidate solution S₁, . . . , S_(N), and their variance σ_(j) (optionally, the techniques may compute the weighted variance that would account for the fitness of each candidate). The next node to be probed is the one that maximizes this variance, because it is expected to disprove as many of the models as possible, and therefore to accelerate the selection process.
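The probing rule can be sketched as a variance computation over the candidates' predictions. The sketch assumes the predictions are available as an array with one row per node and one column per candidate solution; the function name next_node_to_probe and the example values are hypothetical.

```python
import numpy as np

def next_node_to_probe(predictions, fitness=None):
    """Return the index of the node whose predictions disagree the most.

    predictions[j, k] is the estimate of M for node n_j by candidate S_k.
    If fitness is given, a fitness-weighted variance is used instead."""
    P = np.asarray(predictions, dtype=float)
    if fitness is None:
        w = np.ones(P.shape[1])
    else:
        w = np.asarray(fitness, dtype=float)
    w = w / w.sum()
    mean = P @ w                                   # per-node (weighted) mean
    var = ((P - mean[:, None]) ** 2) @ w           # per-node (weighted) variance
    return int(np.argmax(var)), var

predictions = [[10.0, 10.0, 10.0],   # all candidates agree: nothing to learn
               [10.0, 12.0, 14.0],   # strong disagreement: informative probe
               [5.0, 5.0, 6.0]]
node, variances = next_node_to_probe(predictions)
weighted_node, _ = next_node_to_probe(predictions, fitness=[1.0, 1.0, 0.0])
```

Probing the node at index 1 yields a measurement that at least two of the three candidates must mispredict, which is why it discriminates between them fastest.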

A fourth component of the techniques herein is the anomaly detection itself, using the computed regression function F. Because VBLS provides an interval of confidence on the estimate M_(i) (see FIG. 10), that is, an interval [M_(i,x%), M_(i,(100-x)%)] that captures the expected extrema of (100−2·x)% of the nodes, outlier detection can be easily achieved by verifying that any newly measured metric M_(i) is indeed within this interval.
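The interval check can be sketched as below, under the assumption that the predictive distribution for a node is Gaussian with a known mean and standard deviation (how those two values are obtained from the regression is outside this sketch). The function names and the numbers are hypothetical; NormalDist comes from the Python standard library.

```python
from statistics import NormalDist

def confidence_interval(mean, std, x_percent=2.5):
    """Interval between the x% and (100 - x)% quantiles of the prediction."""
    dist = NormalDist(mean, std)
    low = dist.inv_cdf(x_percent / 100.0)
    high = dist.inv_cdf(1.0 - x_percent / 100.0)
    return low, high

def is_anomalous(measured, mean, std, x_percent=2.5):
    """True if the measured metric falls outside the confidence interval."""
    low, high = confidence_interval(mean, std, x_percent)
    return not (low <= measured <= high)

# Hypothetical predicted path delay of 100 ms with a 10 ms standard deviation.
low, high = confidence_interval(100.0, 10.0)
```

With x = 2.5 the interval covers 100 − 2·2.5 = 95% of the expected values, so a measured delay of 130 ms would be flagged while 105 ms would not.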

A fifth component of the techniques herein is the specification of a newly defined IPv6 message sent by the LM_(d) of a FAR to the NMS, in order to provide the computed regression function of M_(i). Such a function can then be used on the NMS in order to visualize the relationship between various node properties and the metric of interest M_(i). For example, one may observe the dependency between latency and hop count in a typical LLN, or the model can be used to show the direct dependency between the end-to-end path reliability and the location of the nodes in the network, or the influence of the number of nodes on the overall latency or jitter. This information can then be used by the end user to compute the probability of meeting specific SLAs knowing other network attributes (e.g., if the QoS (e.g., path delay) is a function of the node rank, link quality, and node type, then, knowing the routing topology, it becomes possible to compute the number of nodes that will be reached in less than X ms (SLA)). Moreover, the network administrator may be able to detect anomalies and take actions to fix performance issues in the network.
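The SLA computation mentioned above can be sketched as follows. The regression function F used here is entirely hypothetical (20 ms per unit of rank plus a penalty for poor link quality); in practice it would be the function reported by the LM_(d). The function name nodes_meeting_sla and the node values are likewise assumptions.

```python
import numpy as np

def F(X):
    """Hypothetical regression function: path delay in ms from (rank, link quality)."""
    rank, link_quality = X[:, 0], X[:, 1]
    return 20.0 * rank + 50.0 * (1.0 - link_quality)

def nodes_meeting_sla(regression, X_nodes, sla_ms):
    """Count the nodes whose predicted delay is below the SLA threshold."""
    predicted = regression(np.asarray(X_nodes, dtype=float))
    return int(np.sum(predicted < sla_ms)), predicted

# One row per node: (rank, link quality in [0, 1]); values are illustrative.
X_nodes = [[1, 0.9], [2, 0.8], [3, 0.5], [5, 0.9]]
count, predicted = nodes_meeting_sla(F, X_nodes, sla_ms=90.0)
```

Dividing the count by the number of nodes gives a simple estimate of the fraction of the network expected to meet the SLA, without probing any node.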

Notably, an overview of the implementation of this architecture is illustrated in FIG. 11. The implementation consists of several logical components: the Pre-Processing Layer (PPL), the Orchestration Layer (OL), and the Learning Machine (LM) module itself.

1. Pre-Processing Layer: There are two sub-components to the Pre-Processing Layer: the State Tracking Engine (STE) and the Metric Computation Engine (MCE).

    a. The STE runs natively on the FAR. Its responsibility is to keep track of the various characteristics (such as routing packets, DHCP packets, node join, parent change, etc.) of all the network elements that are visible to it (end points and the FAR itself). In an example implementation, all these states are stored in a file and periodically pushed to the MCE. The STE may be implemented in an existing thread and runs as a part of the same process that handles packet forwarding.

    b. The MCE may be implemented on the daughtercard, and may consist of a server that receives all the information from the STE over a TCP socket. All this information is then pushed into a database. The MCE also computes metrics and gathers the required data from the database using database APIs. These metrics are then consumed by the LM algorithms that are running on the daughtercard. The MCE may be written as a standalone process that populates the database once the information is received over the socket.

2. Orchestration Layer: This component is responsible for acting as the glue that joins the various components of the PPL and the LM module. It may be implemented on the FAR as a new thread and is a part of the same process as the STE. The OL creates two sockets, one that communicates with the LM module and the other with the MCE. The socket for the LM is used to periodically communicate with the LM. The LM sends periodic requests over this socket, based on which the OL can take actions. The MCE socket is used only for sending information from the STE to the MCE so that the database can be populated for the most up-to-date metrics.

3. Learning Machine Module: The LM module may be executed natively on a daughtercard of the FAR. The LM module is composed of the FC algorithm, which uses a library for random number generation and linear algebra operations. The FC algorithm illustratively maintains several instances of VBLS that are constantly fed with data computed by the MCE. At regular intervals, the FC algorithm evaluates the agreement of its candidate solutions for various nodes n_(i) and subsequently sends requests for QoS probing to the Orchestration Layer through its dedicated socket.

FIG. 12 illustrates an example simplified procedure 1200 for learning machine based detection of abnormal network performance in accordance with one or more embodiments described herein, particularly from the perspective of a border router (learning machine, FAR, etc.). The procedure 1200 may start at step 1205, and continues to step 1210, where, as described in greater detail above, the border router receives, from a network management server (NMS), a set of network properties x_(i) and network performance metrics M_(i). Accordingly, in step 1215, the border router may then begin intercepting x_(i) and M_(i) transmitted from nodes in a computer network of the border router (and/or probing for M_(i) from nodes n_(j) that yield a maximum divergence in terms of prediction of M_(i) among all candidate solutions), as described above. Based on x_(i) and M_(i), the border router may then build a regression function F in step 1220, and may use the regression function F to detect one or more anomalies in the intercepted x_(i) and M_(i) in step 1225 in a manner detailed above. Optionally, as mentioned above, the border router may report the one or more detected anomalies to the NMS in step 1230, and/or may report the regression function F to the NMS in step 1235. The procedure 1200 illustratively ends in step 1240, though notably with the option to receive updated x_(i) and M_(i), or to continue intercepting x_(i) and M_(i) and detecting anomalies.

Notably, FIG. 13 illustrates an example simplified procedure 1300 for building the regression function F and determining relevant features f_(j)(x) to use as input to the regression algorithm (i.e., to determine a function F(x)) in accordance with one or more embodiments described herein. The procedure 1300 may start at step 1305, and continues to step 1310, where, as described in greater detail above, a plurality of features may be generated with a feature construction algorithm that randomly pairs operators to one of either constant values or input dimensions to populate a list with a plurality of features. By determining whether features are irrelevant to prediction of M_(i) based on a VBLS-based weight of a corresponding feature in step 1315, those features that are irrelevant to prediction of M_(i) may be removed from the list in step 1320. The illustrative procedure 1300 may then end in step 1325 (though notably able to continually update the set of relevant features, accordingly).

In addition, FIG. 14 illustrates an example simplified procedure 1400 for learning machine based detection of abnormal network performance in accordance with one or more embodiments described herein, particularly from the perspective of a network management server (NMS). The procedure 1400 may start at step 1405, and continues to step 1410, where, as described in greater detail above, the NMS determines a set of network properties x_(i) and network performance metrics M_(i), which may be sent to a border router of a computer network in step 1415. Accordingly, in step 1420, the NMS should receive, from the border router, one or more detected anomalies in intercepted x_(i) and M_(i) transmitted from nodes in the computer network based on a regression function F built by the border router based on x_(i) and M_(i), in a manner as detailed above. Optionally, as mentioned above, in step 1425 the NMS may also receive the regression function F from the border router. The procedure 1400 ends in step 1430, notably with the option to update x_(i) and M_(i), and/or to receive detected anomalies or updated regression functions.

It should be noted that while certain steps within procedures 1200-1400 may be optional as described above, the steps shown in FIGS. 12-14 are merely examples for illustration, and certain other steps may be included or excluded as desired. Further, while a particular order of the steps is shown, this ordering is merely illustrative, and any suitable arrangement of the steps may be utilized without departing from the scope of the embodiments herein. Moreover, while procedures 1200-1400 are described separately, certain steps from each procedure may be incorporated into each other procedure, and the procedures are not meant to be mutually exclusive.

The techniques described herein, therefore, provide for learning machine based detection of abnormal network performance. In particular, the current approaches used to monitor, troubleshoot, and configure network performance require retrieving a number of network properties, leading to a vast amount of control traffic that is simply not applicable to LLNs because of their constrained nature (e.g., large number of devices and properties, limited bandwidth, etc.). According to the techniques herein, it becomes possible to build models of various network metrics and perform anomaly detection in a highly scalable fashion with very limited control plane traffic. Specifically, the techniques herein enable the FAR to perform predictive analytics of node metrics such as QoS or joining times, that is, it can predict the QoS of a node without probing it, thereby serving as an enabling technology for many other advanced features.

While there have been shown and described illustrative embodiments that provide for learning machine based detection of abnormal network performance, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the embodiments herein. For example, the embodiments have been shown and described herein with relation to LLNs and related protocols. However, the embodiments in their broader sense are not as limited, and may, in fact, be used with other types of communication networks and/or protocols. In addition, while the embodiments have been shown and described with relation to learning machines in the specific context of communication networks, certain techniques and/or certain aspects of the techniques may apply to learning machines in general without the need for relation to communication networks, as will be understood by those skilled in the art.

The foregoing description has been directed to specific embodiments. It will be apparent, however, that other variations and modifications may be made to the described embodiments, with the attainment of some or all of their advantages. For instance, it is expressly contemplated that the components and/or elements described herein can be implemented as software being stored on a tangible (non-transitory) computer-readable medium (e.g., disks/CDs/RAM/EEPROM/etc.) having program instructions executing on a computer, hardware, firmware, or a combination thereof. Accordingly this description is to be taken only by way of example and not to otherwise limit the scope of the embodiments herein. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the embodiments herein.

What is claimed is:
1. A method, comprising: receiving, at a border router from a network management server (NMS), a message that includes a set of network properties x_(i) and network performance metrics M_(i), wherein the border router is executing one of a plurality of learning machines distributed across a plurality of border routers; intercepting, by the border router, x_(i) and M_(i) transmitted to the NMS from nodes in a computer network of the border router; building, by the border router, a regression function F based on x_(i) and M_(i); detecting, by the border router, one or more anomalies in the intercepted x_(i) and M_(i) based on the regression function F; and reporting the one or more detected anomalies to the NMS to cause the NMS to fix performance issues in the computer network.
2. The method as in claim 1, wherein the message is an IPv6 unicast message.
3. The method as in claim 1, further comprising: reporting the regression function F to the NMS.
4. The method as in claim 1, wherein building the regression function F comprises: determining relevant features f_(j)(x) to use as input to a regression algorithm to determine a function F(x).
5. The method as in claim 4, wherein determining relevant features f_(j)(x) comprises: populating a list with a plurality of features; and removing from the list those features that are irrelevant to prediction of M_(i).
6. The method as in claim 5, wherein populating comprises: generating a plurality of features with a feature construction algorithm that randomly pairs operators to one of either constant values or input dimensions.
7. The method as in claim 5, further comprising: determining whether features are irrelevant to prediction of M_(i) based on a Variational Bayes Least Square (VBLS) based weight of a corresponding feature.
8. The method as in claim 1, further comprising: probing for M_(i).
9. The method as in claim 8, wherein probing comprises: probing for M_(i) from nodes n_(j) that yield a maximum divergence in terms of prediction of M_(i) among all candidate solutions.
10. A method, comprising: determining, by a network management server (NMS), a set of network properties x_(i) and network performance metrics M_(i); sending from the NMS, a message that includes the set of network properties x_(i) and network performance metrics M_(i) to a border router of a computer network, wherein the border router is executing one of a plurality of learning machines distributed across a plurality of border routers; receiving, from the border router, a report of the one or more detected anomalies in intercepted x_(i) and M_(i) transmitted to the NMS from nodes in the computer network based on a regression function F built by the border router based on x_(i) and M_(i); and fixing, by the NMS, performance issues in the computer network based on the one or more detected anomalies.
11. The method as in claim 10, further comprising: receiving, at the NMS, the regression function F from the border router.
12. An apparatus, comprising: one or more network interfaces to communicate as a border router with a computer network; a processor coupled to the network interfaces and adapted to execute one or more processes; and a memory configured to store a process executable by the processor, the process when executed operable to: receive, from a network management server (NMS), a message that includes a set of network properties x_(i) and network performance metrics M_(i), wherein the border router is executing one of a plurality of learning machines distributed across a plurality of border routers; intercept x_(i) and M_(i) transmitted to the NMS from nodes in the computer network; build a regression function F based on x_(i) and M_(i); detect one or more anomalies in the intercepted x_(i) and M_(i) based on the regression function F; and report the one or more detected anomalies to the NMS to cause the NMS to fix performance issues in the computer network.
13. The apparatus as in claim 12, wherein the message is an IPv6 unicast message.
14. The apparatus as in claim 12, wherein the process when executed is further operable to: report the regression function F to the NMS.
15. The apparatus as in claim 12, wherein the process when executed to build the regression function F is further operable to: determine relevant features f_(j)(x) to use as input to a regression algorithm to determine a function F(x).
16. The apparatus as in claim 15, wherein the process when executed to determine relevant features f_(j)(x) is further operable to: populate a list with a plurality of features; and remove from the list those features that are irrelevant to prediction of M_(i).
17. The apparatus as in claim 16, wherein the process when executed to populate is further operable to: generate a plurality of features with a feature construction algorithm that randomly pairs operators to one of either constant values or input dimensions.
18. The apparatus as in claim 16, wherein the process when executed is further operable to: determine whether features are irrelevant to prediction of M_(i) based on a Variational Bayes Least Square (VBLS) based weight of a corresponding feature.
19. The apparatus as in claim 12, wherein the process when executed is further operable to: probe for M_(i).
20. The apparatus as in claim 19, wherein the process when executed to probe is further operable to: probe for M_(i) from nodes n_(j) that yield a maximum divergence in terms of prediction of M_(i) among all candidate solutions.
21. An apparatus, comprising: one or more network interfaces to communicate as a network management server (NMS) with a border router of a computer network; a processor coupled to the network interfaces and adapted to execute one or more processes; and a memory configured to store a process executable by the processor, the process when executed operable to: determine a set of network properties x_(i) and network performance metrics M_(i); send a message that includes the set of network properties x_(i) and network performance metrics M_(i) to a border router of a computer network, wherein the border router is executing one of a plurality of learning machines distributed across a plurality of border routers; and receive, from the border router, a report including one or more detected anomalies in intercepted x_(i) and M_(i) transmitted from nodes in the computer network based on a regression function F built by the border router based on x_(i) and M_(i); and fix performance issues in the computer network based on the report.
22. The apparatus as in claim 21, wherein the process when executed is further operable to: receive the regression function F from the border router.